- Design, implement, and manage cloud-based or on-premises infrastructure for AI/ML workflows.
- Automate infrastructure provisioning, configuration, and scaling using tools like Terraform, Kubernetes, or equivalent.
- Optimize compute resources, ensuring cost-effective and efficient use for training and inference.
- Develop and maintain Continuous Integration/Continuous Deployment (CI/CD) pipelines tailored for AI/ML workflows.
- Automate the testing, validation, and deployment of AI models into production.
- Integrate with monitoring tools to ensure pipeline reliability and performance.
- Deploy machine learning models to production environments using containerization (e.g., Docker) and orchestration platforms (e.g., Kubernetes).
- Implement model monitoring for drift detection, performance, and real-time insights.
- Collaborate with data scientists to establish A/B testing and versioning strategies.
- Work closely with data scientists, ML engineers, and software developers to streamline workflows.
- Build tools and frameworks that accelerate experimentation and productionization of AI systems.
- Ensure data and model security by implementing robust access controls, encryption, and compliance with industry regulations.
- Manage secure handling of sensitive data used in AI/ML training and testing.
- Identify bottlenecks in AI/ML workflows and optimize system performance.
- Diagnose and resolve issues in the deployment pipeline or production environment.
- Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field.
- New graduates are welcome.
- Familiar with the fundamentals of artificial intelligence and machine learning.
- Hands-on experience with open-source AI models is preferred but not required.
- Proven ability to learn quickly.
- Proficiency in scripting and automation languages or AI tools (Python, Bash, Jupyter Notebook, Golang, etc.).
- Familiarity with at least one of the following: AI models (Hugging Face, Claude, etc.), ML frameworks (TensorFlow, PyTorch, etc.), or deployment tools (e.g., MLflow, Seldon, or TFX).
- Solid understanding of computer algorithms, AI training, inference, and AI-powered use cases.
- Nice to have: experience with infrastructure as code (IaC) tools such as Terraform or Ansible.
- Strong problem-solving and analytical skills.
- Excellent communication and teamwork abilities.
- Adaptability to a fast-paced, evolving environment.
- Location: Must be based in Santa Clara, CA.
- ARTIFICIAL INTELLIGENCE
- MACHINE LEARNING
- PRODUCTION ENVIRONMENT
- PERFORMANCE OPTIMIZATION
- SHELL SCRIPTING
- PROBLEM-SOLVING
- ENCRYPTION
- USE CASES
- PROVISIONING
- SCRIPTING
- DATA SCIENCE
- REAL-TIME
- CONTINUOUS INTEGRATION/DELIVERY
- PYTORCH
- DOCKER
- TERRAFORM
- CONTAINERIZATION
- DEVOPS
- OPERATIONS
- PYTHON
- KUBERNETES
- Minimum Degree Required: Bachelor's Degree
- Certifications & Licenses:
Company
Location
Santa Clara, California - United States of America
Job type
Full-Time
Job Title: AI DevOps Engineer
Location: Santa Clara, CA 95054 (Hybrid Role)
Duration: 12+ Months Contract Role + Extension
Pay Range: $40/hr to $45/hr
Job Description:
Position Overview: We are seeking an innovative and highly motivated AI DevOps Engineer to join our dynamic team. In this role, you will bridge development and operations in AI-focused projects, ensuring the seamless deployment, scalability, and reliability of AI and machine learning (ML) applications. You will design and manage the infrastructure, tools, and pipelines that enable data scientists and AI engineers to efficiently develop, train, and deploy models.
In addition, you will use external and internal AI tools to automate workflows end to end, from prompt to execution.
Key Responsibilities:
Infrastructure Automation and Management:
Familiarity with the fundamentals of artificial intelligence:
Model Deployment and Monitoring:
Collaboration with Cross-functional Teams:
Security and Compliance:
Performance Optimization and Troubleshooting:
Education:
Experience:
Technical Skills:
Soft Skills:
Required Skills:
Additional Skills: